My daughter is convinced she can tell the difference between M&M colors by their flavor. I’m not buying it. Since I’m a statistician, we devise the following two experiments in an attempt to investigate her claim.

Experiment 1

There are 6 M&M colors. In this experiment, I select one M&M of each color and give it to my daughter while she is blindfolded. She guesses the color of each M&M as she tastes it. We agree that after she guesses a color, she will not guess it again in that round. We perform this exercise for three rounds, so a total of 18 M&Ms are consumed. At the end, we count the number of colors she correctly determined out of 18.

Experiment 2

In this version, I randomly draw 18 M&Ms from the bag, without regard to color, and once again provide them to my blindfolded daughter. She makes her best guess at the color of each M&M and at the conclusion we count the number of correctly determined M&Ms out of 18.

Given these experiments, we want to answer the following questions:

  1. When using \(\alpha = 0.05\) level of significance, what is the critical value for testing the null hypothesis that she is merely guessing (equal likelihood of selecting any given color)? What is the size of the test?

  2. Suppose my daughter can correctly distinguish blue M&Ms from the others, but if the M&M is not blue, she’s merely guessing. What is the statistical power of the testing procedure for this situation? Given the answer, which experiment would be preferred by a statistician (assuming both are equally convenient)?

  3. Suppose my daughter correctly identified 7 M&Ms. Is this sufficient evidence against the null hypothesis that she is merely guessing?

rr exp_1 <- function(h0 = TRUE) { max <- ifelse(h0, 6, 5) truth <- sample(1:max) guess <- sample(1:max) # How many were correctly guessed? sum(guess == truth) + !h0 }

rr sim_1 <- function(h0 = TRUE, n_rounds = 3, n_reps = 10000) { replicate(n_reps, sum(replicate(n_rounds, exp_1(h0)))) }

rr sim_2 <- function(h0 = TRUE, n_samples = 18, n_reps = 10000) { if (h0) { # Random guess (1/6 likelihood of success) rbinom(n_reps, n_samples, p = 1/6) } else { # How many blues are included - these are always correctly determined blues <- rbinom(n_reps, n_samples, p = 1/6)

# The ramaining m&ms after blues are accounted for are randomly guessed
others <- rbinom(n_reps, n_samples - blues, p = 1/5)
blues + others

} }

rr h0_dist_1 <- sim_1() ha_dist_1 <- sim_1(FALSE)

rr h0_dist_2 <- sim_2() ha_dist_2 <- sim_2(FALSE)

Now that we have these distributions, we can answer the questions.

rr alpha <- 0.05 # Critical value (crit_val_1 <- quantile(h0_dist_1, probs = 1 - alpha, type = 1))

95% 
  6 

rr # Size (probability of Type I error (false rejection of null hypothesis)) mean(h0_dist_1 >= crit_val_1)

[1] 0.0826

rr # Critical value (crit_val_2 <- quantile(h0_dist_2, probs = 1 - alpha, type = 1))

95% 
  6 

rr # Size (probability of Type I error (false rejection of null hypothesis)) mean(h0_dist_2 > crit_val_2)

[1] 0.0203

Statistical power is “the probability the test correctly rejects \(H_0\) for a given \(H_A\).”

rr prop_ci <- function(x) { est <- mean(x) ci <- qnorm(0.975) * sqrt(est * (1 - est) / length(x)) c( lower = est - ci, estimate = est, upper = est + ci ) }

rr (power_1 <- prop_ci(ha_dist_1 > crit_val_1))

   lower estimate    upper 
0.339459 0.348800 0.358141 

rr

power_2 <- prop_ci(ha_dist_2 >= crit_val_2)
power_2
    lower  estimate     upper 
0.3804403 0.3900000 0.3995597 

rr

# Number of M&Ms correctly identified
observed_stat <- 7

p_val_1 <- prop_ci(h0_dist_1 >= observed_stat)
p_val_1
     lower   estimate      upper 
0.02731899 0.03070000 0.03408101 

rr

p_val_2 <- prop_ci(h0_dist_2 >= observed_stat)
p_val_2
     lower   estimate      upper 
0.02015572 0.02310000 0.02604428 
LS0tCnRpdGxlOiAiTSZNIFN0dWR5IgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpgYGB7ciBzZXR1cCwgaW5jbHVkZSA9IEZBTFNFfQprbml0cjo6b3B0c19jaHVuayRzZXQoY2FjaGUgPSBUUlVFKQpgYGAKCk15IGRhdWdodGVyIGlzIGNvbnZpbmNlZCBzaGUgY2FuIHRlbGwgdGhlIGRpZmZlcmVuY2UgYmV0d2VlbiBNJk0gY29sb3JzIGJ5IHRoZWlyCmZsYXZvci4gSSdtIG5vdCBidXlpbmcgaXQuIFNpbmNlIEknbSBhIHN0YXRpc3RpY2lhbiwgd2UgZGV2aXNlIHRoZSBmb2xsb3dpbmcKdHdvIGV4cGVyaW1lbnRzIGluIGFuIGF0dGVtcHQgdG8gaW52ZXN0aWdhdGUgaGVyIGNsYWltLgoKIyMjIEV4cGVyaW1lbnQgMQpUaGVyZSBhcmUgNiBNJk0gY29sb3JzLiBJbiB0aGlzIGV4cGVyaW1lbnQsIEkgc2VsZWN0IG9uZSBNJk0gb2YgZWFjaCBjb2xvciBhbmQgCmdpdmUgaXQgdG8gbXkgZGF1Z2h0ZXIgd2hpbGUgc2hlIGlzIGJsaW5kZm9sZGVkLiBTaGUgZ3Vlc3NlcyB0aGUgY29sb3Igb2YgZWFjaApNJk0gYXMgc2hlIHRhc3RlcyBpdC4gV2UgYWdyZWUgdGhhdCBhZnRlciBzaGUgZ3Vlc3NlcyBhIGNvbG9yLCBzaGUgd2lsbCBub3QgZ3Vlc3MKaXQgYWdhaW4gaW4gdGhhdCByb3VuZC4gV2UgcGVyZm9ybSB0aGlzIGV4ZXJjaXNlIGZvciB0aHJlZSByb3VuZHMsIHNvIGEgdG90YWwgb2YgCjE4IE0mTXMgYXJlIGNvbnN1bWVkLiBBdCB0aGUgZW5kLCB3ZSBjb3VudCB0aGUgbnVtYmVyIG9mIGNvbG9ycyBzaGUgY29ycmVjdGx5IApkZXRlcm1pbmVkIG91dCBvZiAxOC4KCiMjIyBFeHBlcmltZW50IDIKSW4gdGhpcyB2ZXJzaW9uLCBJIHJhbmRvbWx5IGRyYXcgMTggTSZNcyBmcm9tIHRoZSBiYWcsIHdpdGhvdXQgcmVnYXJkIHRvIGNvbG9yLCAKYW5kIG9uY2UgYWdhaW4gcHJvdmlkZSB0aGVtIHRvIG15IGJsaW5kZm9sZGVkIGRhdWdodGVyLiBTaGUgbWFrZXMgaGVyIGJlc3QgZ3Vlc3MKYXQgdGhlIGNvbG9yIG9mIGVhY2ggTSZNIGFuZCBhdCB0aGUgY29uY2x1c2lvbiB3ZSBjb3VudCB0aGUgbnVtYmVyIG9mIGNvcnJlY3RseQpkZXRlcm1pbmVkIE0mTXMgb3V0IG9mIDE4LgoKR2l2ZW4gdGhlc2UgZXhwZXJpbWVudHMsIHdlIHdhbnQgdG8gYW5zd2VyIHRoZSBmb2xsb3dpbmcgcXVlc3Rpb25zOgoKMS4gV2hlbiB1c2luZyAkXGFscGhhID0gMC4wNSQgbGV2ZWwgb2Ygc2lnbmlmaWNhbmNlLCB3aGF0IGlzIHRoZSBjcml0aWNhbCB2YWx1ZSAKZm9yIHRlc3RpbmcgdGhlIG51bGwgaHlwb3RoZXNpcyB0aGF0IHNoZSBpcyBtZXJlbHkgZ3Vlc3NpbmcgKGVxdWFsIGxpa2VsaWhvb2Qgb2YKc2VsZWN0aW5nIGFueSBnaXZlbiBjb2xvcik/IFdoYXQgaXMgdGhlIHNpemUgb2YgdGhlIHRlc3Q/CgoyLiBTdXBwb3NlIG15IGRhdWdodGVyIGNhbiBjb3JyZWN0bHkgZGlzdGluZ3Vpc2ggYmx1ZSBNJk1zIGZyb20gdGhlIG90aGVycywgYnV0CmlmIHRoZSBNJk0gaXMgbm90IGJsdWUsIHNoZSdzIG1lcmVseSBndWVzc2luZy4gV2hhdCBpcyB0aGUgc3RhdGlzdGljYWwgcG93ZXIgb2YgCnRoZSB0ZXN0aW5nIHByb2NlZHVyZSBmb3IgdGhpcyBzaXR1YXRpb24/IEdpdmVuIHRoZSBhbnN3ZXIsIHdoaWNoIGV4cGVyaW1lbnQKd291bGQgYmUgcHJlZmVycmVkIGJ5IGEgc3RhdGlzdGljaWFuIChhc3N1bWluZyBib3RoIGFyZSBlcXVhbGx5IGNvbnZlbmllbnQpPwoKMy4gU3VwcG9zZSBteSBkYXVnaHRlciBjb3JyZWN0bHkgaWRlbnRpZmllZCA3IE0mTXMuIElzIHRoaXMgc3VmZmljaWVudCBldmlkZW5jZSAKYWdhaW5zdCB0aGUgbnVsbCBoeXBvdGhlc2lzIHRoYXQgc2hlIGlzIG1lcmVseSBndWVzc2luZz8KCmBgYHtyfQpleHBfMSA8LSBmdW5jdGlvbihoMCA9IFRSVUUpIHsKICBtYXggPC0gaWZlbHNlKGgwLCA2LCA1KQogIHRydXRoIDwtIHNhbXBsZSgxOm1heCkKICBndWVzcyA8LSBzYW1wbGUoMTptYXgpCiAgIyBIb3cgbWFueSB3ZXJlIGNvcnJlY3RseSBndWVzc2VkPwogIHN1bShndWVzcyA9PSB0cnV0aCkgKyAhaDAKfQpgYGAKCmBgYHtyfQpzaW1fMSA8LSBmdW5jdGlvbihoMCA9IFRSVUUsIG5fcm91bmRzID0gMywgbl9yZXBzID0gMTAwMDApIHsKICByZXBsaWNhdGUobl9yZXBzLCBzdW0ocmVwbGljYXRlKG5fcm91bmRzLCBleHBfMShoMCkpKSkKfQpgYGAKCmBgYHtyfQpzaW1fMiA8LSBmdW5jdGlvbihoMCA9IFRSVUUsIG5fc2FtcGxlcyA9IDE4LCBuX3JlcHMgPSAxMDAwMCkgewogIGlmIChoMCkgewogICAgIyBSYW5kb20gZ3Vlc3MgKDEvNiBsaWtlbGlob29kIG9mIHN1Y2Nlc3MpCiAgICByYmlub20obl9yZXBzLCBuX3NhbXBsZXMsIHAgPSAxLzYpCiAgfSBlbHNlIHsKICAgICMgSG93IG1hbnkgYmx1ZXMgYXJlIGluY2x1ZGVkIC0gdGhlc2UgYXJlIGFsd2F5cyBjb3JyZWN0bHkgZGV0ZXJtaW5lZAogICAgYmx1ZXMgPC0gcmJpbm9tKG5fcmVwcywgbl9zYW1wbGVzLCBwID0gMS82KQogICAgCiAgICAjIFRoZSByYW1haW5pbmcgbSZtcyBhZnRlciBibHVlcyBhcmUgYWNjb3VudGVkIGZvciBhcmUgcmFuZG9tbHkgZ3Vlc3NlZAogICAgb3RoZXJzIDwtIHJiaW5vbShuX3JlcHMsIG5fc2FtcGxlcyAtIGJsdWVzLCBwID0gMS81KQogICAgYmx1ZXMgKyBvdGhlcnMKICB9Cn0KYGBgCgpgYGB7cn0KaDBfZGlzdF8xIDwtIHNpbV8xKCkKaGFfZGlzdF8xIDwtIHNpbV8xKEZBTFNFKQpgYGAKCmBgYHtyfQpoMF9kaXN0XzIgPC0gc2ltXzIoKQpoYV9kaXN0XzIgPC0gc2ltXzIoRkFMU0UpCmBgYAoKTm93IHRoYXQgd2UgaGF2ZSB0aGVzZSBkaXN0cmlidXRpb25zLCB3ZSBjYW4gYW5zd2VyIHRoZSBxdWVzdGlvbnMuCgpgYGB7cn0KYWxwaGEgPC0gMC4wNQojIENyaXRpY2FsIHZhbHVlCihjcml0X3ZhbF8xIDwtIHF1YW50aWxlKGgwX2Rpc3RfMSwgcHJvYnMgPSAxIC0gYWxwaGEsIHR5cGUgPSAxKSkKIyBTaXplIChwcm9iYWJpbGl0eSBvZiBUeXBlIEkgZXJyb3IgKGZhbHNlIHJlamVjdGlvbiBvZiBudWxsIGh5cG90aGVzaXMpKQptZWFuKGgwX2Rpc3RfMSA+PSBjcml0X3ZhbF8xKQpgYGAKCmBgYHtyfQojIENyaXRpY2FsIHZhbHVlCihjcml0X3ZhbF8yIDwtIHF1YW50aWxlKGgwX2Rpc3RfMiwgcHJvYnMgPSAxIC0gYWxwaGEsIHR5cGUgPSAxKSkKIyBTaXplIChwcm9iYWJpbGl0eSBvZiBUeXBlIEkgZXJyb3IgKGZhbHNlIHJlamVjdGlvbiBvZiBudWxsIGh5cG90aGVzaXMpKQptZWFuKGgwX2Rpc3RfMiA+PSBjcml0X3ZhbF8yKQpgYGAKClN0YXRpc3RpY2FsIHBvd2VyIGlzICJ0aGUgcHJvYmFiaWxpdHkgdGhlIHRlc3QgY29ycmVjdGx5IHJlamVjdHMgJEhfMCQgZm9yIGEgZ2l2ZW4KJEhfQSQuIgoKYGBge3J9CnByb3BfY2kgPC0gZnVuY3Rpb24oeCkgewogIGVzdCA8LSBtZWFuKHgpCiAgY2kgPC0gcW5vcm0oMC45NzUpICogc3FydChlc3QgKiAoMSAtIGVzdCkgLyBsZW5ndGgoeCkpCiAgYygKICAgIGxvd2VyID0gZXN0IC0gY2ksCiAgICBlc3RpbWF0ZSA9IGVzdCwKICAgIHVwcGVyID0gZXN0ICsgY2kKICApCn0KYGBgCgoKYGBge3J9Cihwb3dlcl8xIDwtIHByb3BfY2koaGFfZGlzdF8xID49IGNyaXRfdmFsXzEpKQpgYGAKCmBgYHtyLCBtZXNzYWdlID0gRkFMU0V9CmxpYnJhcnkodGlkeXZlcnNlKQp0aWJibGUobnVsbCA9IGgwX2Rpc3RfMSwKICAgICAgIGFsdCA9IGhhX2Rpc3RfMSkgJT4lIAogIGdhdGhlcigpICU+JSAKICBjb3VudChrZXksIHZhbHVlKSAlPiUgCiAgY29tcGxldGUoa2V5LCB2YWx1ZSkgJT4lIAogIGdncGxvdChhZXMoeCA9IHZhbHVlLCB5ID0gbiwgZmlsbCA9IGtleSkpICsKICBnZW9tX2NvbChwb3NpdGlvbiA9ICJkb2RnZSIpICsKICBnZW9tX3ZsaW5lKHhpbnRlcmNlcHQgPSBjcml0X3ZhbF8xLCBjb2wgPSAicmVkIiwgbHdkID0gMikgKwogIHNjYWxlX3lfY29udGludW91cyhsYWJlbHMgPSBzY2FsZXM6OmNvbW1hKSArCiAgdGhlbWVfYncoKSArCiAgdGhlbWUobGVnZW5kLnBvc2l0aW9uID0gImJvdHRvbSIpICsKICBsYWJzKHkgPSAiQ291bnQiLAogICAgICAgeCA9ICJDb3JyZWN0IE0mTXMiLAogICAgICAgZmlsbCA9ICJIeXBvdGhlc2lzIikKYGBgCgoKYGBge3J9CnBvd2VyXzIgPC0gcHJvcF9jaShoYV9kaXN0XzIgPj0gY3JpdF92YWxfMikKcG93ZXJfMgpgYGAKCmBgYHtyfQojIE51bWJlciBvZiBNJk1zIGNvcnJlY3RseSBpZGVudGlmaWVkCm9ic2VydmVkX3N0YXQgPC0gNwoKcF92YWxfMSA8LSBwcm9wX2NpKGgwX2Rpc3RfMSA+PSBvYnNlcnZlZF9zdGF0KQpwX3ZhbF8xCmBgYAoKYGBge3J9CnBfdmFsXzIgPC0gcHJvcF9jaShoMF9kaXN0XzIgPj0gb2JzZXJ2ZWRfc3RhdCkKcF92YWxfMgpgYGAK